Abstract

An increasing number of scholars are using longitudinal social net- 
work data to try to obtain estimates of peer or social influence ef- 
fects. These data may provide additional statistical leverage, but they 
can introduce new inferential problems. In particular, while the con- 
founding effects of homophily in friendship formation are widely ap- 
preciated, homophily in friendship retention may also confound causal 
estimates of social influence in longitudinal network data. We pro- 
vide evidence for this claim in a Monte Carlo analysis of the statisti- 
cal model used by Christakis, Fowler, and their colleagues in numer- 
ous articles estimating "contagion" effects in social networks. Our 
results indicate that homophily in friendship retention induces signif- 
icant upward bias and decreased coverage levels in the Christakis and 
Fowler model if there is non-negligible friendship attrition over time. 

We thank Nicholas Christakis and James Fowler for generously sharing the simulation code 
adapted in this paper and for helpful comments and clarifications. We are also grateful 
to Megan Andrew, Jason Fletcher, Rob Franzese, Michael Heaney, Jonathan Ladd, Russell 
Lyons, Aya Kachi, Walter Mebane, Jacob Montgomery, David Nickerson, Edward Norton,
Phil Paolino, Fabio Rojas, Betsy Sinclair and two anonymous reviewers for helpful sugges- 
tions and feedback. 



1 Introduction 



Until recently, scholars have given relatively little attention to the influ- 
ence of personal relationships on human behavior, instead studying peo- 
ple largely as atomistic individuals ripped from the social context in which 
they live. Thankfully, this impoverished approach has started to give way 
to an interdisciplinary movement seeking to understand the influence of 
social networks in domains ranging from health to politics. Results from 
cases in which peers were randomly or quasi-randomly assigned such as
college roommates have provided credible evidence of such effects (e.g.,
Sacerdote 2001; Zimmerman 2003; Carrell, Hoekstra and West 2011; for a
review, see Kremer and Levy 2008).[1]

However, when random assignment of peers is not feasible, researchers 
must use observational data, which creates serious inferential problems 
(Manski 1993). In particular, peers may behave similarly as a result of
"correlated effects" such as common environmental shocks or shared
characteristics rather than social influence. Given the likelihood that
peers will be similar on a range of characteristics due to homophily
(McPherson, Smith-Lovin and Cook 2001), distinguishing between homophily
and peer effects has proven to be a very difficult challenge.[2]

Many scholars have therefore turned to longitudinal network data
to try to separate homophily from social influence effects. In principle, ob- 
serving dyads over multiple periods seems as though it could help separate 
homophily in tie formation from subsequent peer influence. However, ho- 
mophily may also affect whether social ties are maintained over time, con- 
founding estimates of peer influence effects. We call this the "unfriending" 
problem in honor of the Facebook practice of removing a person from one's 
list of friends on the online social network site. 

We illustrate the potential inferential consequences of this problem
below in an analysis of the statistical model used to estimate "contagion"
effects in a series of widely publicized studies by Christakis, Fowler, and
their colleagues (hereafter CF). Our Monte Carlo simulations, which are
adapted from those of CF, indicate that their model's estimates of social
influence effects are unbiased and have accurate coverage levels when
homophily in friendship retention is not present. However, when
non-negligible attrition is present, estimates from the CF model show
substantial upward bias and decreased coverage levels as homophily in
friendship retention increases. In short, the "unfriending" problem can
create spurious evidence of social influence when none exists.

1 Related experimental studies by Nickerson (2008) and Fowler and
Christakis (2010) provide suggestive evidence that such effects can extend
two or more degrees, though the applicability of their results to
non-experimental contexts is unclear.

2 For a review of the literature, see Soetevent (2006).



2 Leveraging dynamic networks: A solution? 

The CF studies, which use longitudinal social network data from the Fram- 
ingham Heart Study (FHS) and National Longitudinal Study of Adolescent 
Health (Add Health), make strong claims about the effects of one's friends[3]
on a wide range of dependent variables: obesity (Christakis and Fowler
2007; Fowler and Christakis 2008b), smoking (Christakis and Fowler 2008),
happiness (Fowler and Christakis 2008a), loneliness (Cacioppo, Fowler and
Christakis 2009), depression (Rosenquist et al. 2010), alcohol consumption
(Rosenquist, Fowler and Christakis 2010), sleep loss (Mednick, Christakis
and Fowler 2010), and divorce (McDermott, Fowler and Christakis N.d.).
Each CF paper uses the same approach, estimating versions of the following
model for ego $i$ and alter $j$ observed at times $t_0$ and $t_1$:

$$Y_{i,t_1} = f(Y_{i,t_0}, Y_{j,t_0}, Y_{j,t_1}, \text{controls}) \qquad (1)$$

These models are typically estimated using generalized estimating equa- 
tions (GEE) with an independent correlation structure to account for re- 
peated observations of the same ego (specifically, those who name or are 
named by more than one friend) and dyad (those who name each other 
and are thus included twice, once as the ego and once as the alter). The
functional form of the model varies depending on the distribution of the
dependent variable (logistic regression if the dependent variable is binary;
linear regression if it is continuous or quasi-continuous).[4]

3 The studies typically also estimate social influence effects among family
members; we do not consider the validity of those estimates here.

CF argue that this specification controls for initial homophily (i.e., the
likely similarity between $Y_{i,t_0}$ and $Y_{j,t_0}$), allowing us to
identify the causal effect of changes in the alter's trait from $t_0$ to
$t_1$ by estimating the effect of $Y_{j,t_1}$ controlling for $Y_{j,t_0}$.
In Christakis and Fowler (2007), they write that including alters' lagged
obesity as a covariate "controlled for homophily" (373). In later work, the
language is somewhat more hedged: for instance, they write in Christakis
and Fowler (2008, 2251) that a lagged measure of alter smoking "helped to
account for homophily" (our emphasis). The suggestion that the coefficient
for $Y_{j,t_1}$ is a causal estimate of peer effects nonetheless remains.
Christakis and Fowler (2009) expands on these claims, stating that observed
clustering at up to three degrees of separation reflects "Three Degrees of
Influence" for happiness (51), obesity (108), and smoking (116) and
asserting that we "now know that obesity is contagious" (111).

Cohen-Cole and Fletcher (2008a,b) and Halliday and Kwak (2009) question
whether CF's model adequately controls for homophily, which has been shown
to be significant for weight status (Trogdon, Nonnemaker and Pais 2008;
Halliday and Kwak 2009; Valente et al. 2009),[5] and suggest that their
model may generate spurious inferences (see also Shalizi and Thomas 2011,
Lyons N.d., and Ellen 2009).[6] In response, CF describe Monte Carlo
simulation results "documenting that homophily (ranging from no homophily
to complete homophily) does not result in bias in the estimates of
induction in this model specification" (Fowler and Christakis 2008b, 1404).



4 For other approaches to obtaining causal estimates of peer effects in
longitudinal or repeatedly sampled network data, see Anagnostopoulos and
Mahdian (2008), Bramoullé, Djebbari and Fortin (2009), Aral, Muchnik and
Sundararajan (2009), and Lazer et al. (2010).

5 de la Haye et al. (2010) find homophily in obesity-related behaviors as
well. For further explorations of possible social transmission of obesity
or weight status, see Anderson (N.d.), Barnes, Smith and Yoder (N.d.),
Brown, Hole and Roberts (N.d.), McFerran et al. (2010a), McFerran et al.
(2010b), and Campbell and Mohr (N.d.).

6 There are other concerns with this statistical model such as possible
simultaneity bias and environmental confounding that we do not discuss here.

CF's Monte Carlo results, which are presented in Fowler and Christakis
(2008b) and in a very similar form in Fowler et al. (2011), are derived
from a stylized model in which a population of individuals with a randomly
chosen value on some characteristic of interest form friendships and then
influence each other or not (we discuss the procedure in more detail
below). CF find that estimates of this influence coefficient are
approximately unbiased across varying levels of homophily when the true
peer effect is 0 and have a slight downward bias when the true peer effect
is 0.1. On this basis, they conclude that "This simulation evidence
suggests that the [Cohen-Cole and Fletcher] assertion that homophily causes
us to overestimate the size of the induction effect is false." However, as
we discuss below, their simulation does not incorporate friendship
attrition and thus fails to fully account for the effects of homophily.

3 The "unfriending" problem in longitudinal data 

Due to the prevalence of cross-sectional data and interest in fixed character- 
istics such as race and sex, scholars of social networks have tended to think 
about homophily in relatively static terms and to analyze it as a propen- 
sity to form ties with others who share similar characteristics. However, 
social networks are actually the result of a dynamic process of friendship 
formation and dissolution. 

As a result, while relatively few longitudinal network studies have been 
conducted, most report substantial levels of friendship dissolution between 
survey waves. For instance, Mollenhorst (2009) finds that about half of
adult friendships were replaced over the seven years between the two waves
of his survey. For the adolescents in Add Health, Moody (N.d.) found that
approximately half of the friends named by respondents during in-school
interviews were named again during in-home interviews six to eight months
later. Studies of social networks among younger children have found rates
of attrition that are even higher (Hallinan and Williams 1987; Cairns and
Cairns 1995). A partial exception is the FHS data used by CF, which were
collected in a relatively stable community. O'Malley and Christakis (N.d.)
report that 82% of friendships were maintained between waves in FHS, which
could be a result of asking for "close friends" who could help the
researchers contact the participant in the future.[7]

When friendship attrition is present, homophily is likely to be a factor. 
Just as people who are similar are more likely to be friends, friends who 
are less similar are more likely to stop being friends. Most of us have had 
friends from whom we have grown apart in this way. As we have less in 
common with those people, we stop spending time with them and eventu- 
ally fall out of touch. In some cases, one person may deliberately end the 
relationship as a result of differences in political views, alcohol consump- 
tion, or other behaviors or characteristics. 

Numerous examples of homophily in tie dissolution have been documented in
the sociology literature (see Burt 2000 and McPherson, Smith-Lovin and Cook
2001 for reviews). One well-known example is a two-wave study of adolescent
friendships by Kandel (1978). She describes homophily in friendship
retention based on both initial characteristics and subsequent behavior
(430):

At time 1, prior to any subsequent change, pairs that will remain 
stable over time are much more similar in their behaviors and 
values [marijuana use, educational goals, political views, and 
delinquency] than the subsequently unstable pairs... At time 
2, homophily among former friends is lower than among new 
friendship pairs or stable pairs. 

She interprets these results as a combination of selection (choosing to
become and stay friends with those who are like you) and socialization
(acting more like your friends in those relationships you maintain)
(433-35):

The results support the general conclusion that adolescents coordinate
their choices of friends and their behaviors, in particular the use of
marijuana, so as to maximize congruency within the friendship pair. If
there is a state of unbalance such that the friend's attitude or behavior
is incongruent with the adolescent's, the adolescent will either break off
the friendship and seek another friend or will keep the friend and modify
his own drug behavior.

7 CF report that they treated friendship ties as maintained when a friend
was named at $t_1$ and $t_3$ but tie information was missing at $t_2$
(personal communication). Under this definition, 96% of friendship ties
were maintained between waves. However, since missing tie information was
often the result of missing an exam altogether, friendship retention in
their statistical analyses is likely to be lower.

Childhood and adolescent friendships have also been found to be more stable
when students are more alike by gender (Tuma and Hallinan 1979;
Degirmencioglu et al. 1998; Moody N.d.), race (Moody N.d.), grade (Moody
N.d.), achievement/competence (Tuma and Hallinan 1979; Newcomb, Bukowski
and Bagwell 1999), and aggression (Newcomb, Bukowski and Bagwell 1999).

Similar patterns have been observed among adults. For instance, Kossinets
and Watts (2009) find that dyads who are more similar demographically are
less likely to break ties in the email network of a large American
university (433-434); Popielarz and McPherson (1995) show that members of
voluntary groups who are dissimilar from other members are more likely to
leave; and Burt (2000) documents homophily in the maintenance and
dissolution of bankers' collegial relationships along several dimensions.
Most notably, O'Malley and Christakis (N.d.) document homophily in
friendship retention within the FHS data used in almost all of CF's
studies. They find that differences in BMI, smoking, and measures of body
type are significantly related to the dissolution of friendship ties.

These patterns of homophily in unfriending are potentially a problem for
any statistical analysis of peer effects in observational data. Social
network scholars have been concerned for some time about the difficulty of
separating homophily in friendship formation from contextual influences and
peer effects. However, homophily in friendship retention is an equally
difficult problem. In particular, unfriending is a significant concern for
the CF approach. The specification of their generalized estimating equation
models requires an ego to name an alter as a friend for two or more
consecutive waves (although see footnote 7 above). In this way, they
attempt to leverage longitudinal network data to control for past
friendship ties. However, what happens when some of the dyads at $t_0$ are
no longer friends at $t_1$? Fowler and Christakis argue that including such
friendship pairs in their data will bias the results against finding an
effect because "it essentially adds 'random' non-friend relationships
(i.e., people who are no longer friends) to the pool of friends" (Fowler
and Christakis 2008b, 1401). This is a legitimate issue; non-friends
presumably can no longer influence the person in question.

However, the friendships that have been terminated may not have been
"random." Relationships often end for a reason. If the reason for
friendship termination is related to or is correlated with changes between
$t_0$ and $t_1$ in the underlying trait we are examining, it will induce an
association between $Y_{i,t_1}$ and $Y_{j,t_1}$ that is not captured by the
lagged values of the variables in question.[8] In the CF model, the
coefficient for $Y_{j,t_1}$ is interpreted as a causal effect. As such, the
association induced by homophily in the unfriending process could create
the appearance of an influence effect even if none exists.[9]



4 Monte Carlo simulation procedure 

To determine the extent to which homophily in friendship retention might
lead to spurious inferences under CF's model, we conduct Monte Carlo
simulations in which we do not allow for social influence of alters on
egos.[10] In reality, of course, such influence seems likely to be common.
However, we follow standard practice in Monte Carlo evaluation of
statistical models in assuming that the coefficient in question is zero and
estimating the bias and coverage of the model. (The simulations reported in
Fowler and Christakis (2008b) and Fowler et al. (2011) also include
estimates of bias in the model where the true influence effect is equal to
0.) By working within this framework, which offers well-defined standards
of model performance,[11] we can evaluate the risk that homophily in
friendship retention will lead researchers using the CF model to falsely
reject the null (the standard inferential approach used in applications of
the model).[12]

In this section, we explain the Monte Carlo simulation procedure used in
our analysis, which is adapted from code used in Fowler and Christakis
(2008b) and Fowler et al. (2011). The R code used to generate our results
will be posted on the Social Networks website along with the electronic
version of our article.

8 The threat of non-random unfriending can be considered a specific example
of the problem noted by Nickerson (Fowler et al. 2011) in his discussion of
the value of conducting experiments on static networks: "it is always
possible that measurement of the network post-treatment could be correlated
with the provision of the treatment. If treatments cause certain
relationships to become more salient or networks to change composition,
then many strategies for defining networks . . . may cease to be equivalent
for treatment and control groups."

9 In principle, one might attempt to model the selection process by which
friendships are maintained in order to recover the true value of the
influence coefficient. However, it seems impossible to obtain data that is
granular enough to separate stochastic changes in the trait of interest
from $t_0$ to $t_1$ from peer effects. In the absence of such data,
accurately modeling the friendship retention process requires knowing the
value of $Y_{j,t_1}$ that would have been observed if no influence had
taken place, which is an unobserved counterfactual.

10 A broader question that we do not engage here is whether statistical
models of such effects are formally identified. Shalizi and Thomas (2011)
presents a graphical causal model arguing that such effects are generically
unidentified in observational data for a person $i$ when some latent trait
"$X_i$ directly influences $Y_{i,t}$ for all $t$." In such cases, even
controlling for $Y_{i,t-1}$ and $Y_{j,t-1}$ is not sufficient to identify
the causal effect of a network tie $A_{ij}$ on $Y_{i,t}$.

11 By contrast, it is rare to test a model with a non-zero null hypothesis
using frequentist statistics and it is not entirely clear how such a model
should be evaluated.

12 These Type I errors could happen in a variety of ways. First, many
researchers test hypotheses for which we have weak priors and a null
hypothesis of zero may be a valid initial assumption. In such cases, it is
important to know whether CF's model may generate spurious findings.
Moreover, even if we have strong priors that social influence is taking
place, we can never be sure that the expected effect is present in the data
we are analyzing. It is important to know whether the CF approach could
generate a spurious positive effect in those cases.

The simulation proceeds as follows:

1. A normally distributed trait $Y_{t_0}$ is randomly generated at time
$t_0$ for a population of n=1000 actors where $Y_{t_0} \sim N(50, 10)$.[13]

2. Differences in Y are computed for all dyads of actors $i$ and $j$ in the
same manner as CF:

$$d_{i,j} = -|Y_{i,t_0} - Y_{j,t_0}|$$

The difference term is negatively valued so that dyads with similar traits
have high values.

3. Ties $A_{i,j}$ are created as a function of $d_{i,j}$ using a probit
model based on a latent variable $A^*_{i,j}$. These ties are directed
(i.e., $A_{i,j}$ does not imply $A_{j,i}$):

$$A_{i,j} = \begin{cases} 1 & \text{if } A^*_{i,j} > \epsilon_{i,j} \\ 0 & \text{if } A^*_{i,j} \le \epsilon_{i,j} \end{cases} \qquad \epsilon_{i,j} \sim N(0, 1)$$

where

$$A^*_{i,j} = \eta_0 + \eta_1 d_{i,j}$$

$\eta_0$ represents the baseline propensity to form ties and $\eta_1$
represents the coefficient for homophily (positive values indicate higher
levels of homophily).

4. All actors receive a normally distributed, independent shock $u_i$ to
their trait $Y_{t_0}$ where $u_i \sim N(0, 5)$.[14]

5. In this step, CF assume that ego traits $Y_{t_1}$ are updated as a
function of their previous trait value $Y_{t_0}$ and influence from their
alters.[15] However, since we assume there is no peer influence, ego traits
at $t_1$ are simply the sum of their previous value $Y_{t_0}$ and their
shock:

$$Y_{i,t_1} = Y_{i,t_0} + u_i$$

6. All actors update their friendship ties $A_{i,j}(t_1)$ as in steps 2-3.

7. Following CF, we estimate a linear regression model using generalized
estimating equations with an independent correlation structure for all
dyads who are friends at $t_0$ and $t_1$:

$$Y_{i,t_1} = \beta_0 + Y_{i,t_0}\beta_1 + Y_{j,t_0}\beta_2 + Y_{j,t_1}\beta_3 + \epsilon \quad \text{where } A_{i,j}(t_0, t_1) = 1 \qquad (2)$$

13 Fowler and Christakis (2008b) draw their trait values from the empirical
distribution of BMI in the Framingham data. Since we do not have access to
those data, we use the initial distribution from the simulation in Fowler
et al. (2011) instead.

14 Fowler and Christakis (2008b) use shock values from the empirical
distribution of changes in BMI in the Framingham data. Again, since we do
not have those data, we use the shock distribution from the simulation in
Fowler et al. (2011) instead.

15 In CF's simulation, all egos' values of Y are updated as a weighted
average of their own current value of $Y_{i,t_0} + u_i$ and the average
value of $Y_{j,t_0} + u_j$ for their alters, where the weight $b_1$
represents the relative influence of alters on egos. Since we set peer
influence to 0, our results are not sensitive to this assumption.
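The seven steps above can be sketched in a few lines of code. What follows is a minimal illustration in standard-library Python, not the authors' R code: the function names (`probit_ties`, `ols`, `simulate`) are hypothetical, and the sketch assumes that step 6 re-applies the probit rule only to dyads that were tied at $t_0$. It also substitutes ordinary least squares for GEE. With an independent working correlation and a linear model the point estimates coincide, but no GEE standard errors are computed here.

```python
import random


def probit_ties(pairs, y, eta0, eta1, rng):
    """Keep dyad (i, j) when eta0 + eta1 * d_ij exceeds a N(0, 1) draw,
    where d_ij = -|y_i - y_j| (steps 2-3; reused for step 6)."""
    kept = []
    for i, j in pairs:
        d = -abs(y[i] - y[j])
        if eta0 + eta1 * d > rng.gauss(0.0, 1.0):
            kept.append((i, j))
    return kept


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    aug = [[0.0] * (k + 1) for _ in range(k)]
    for row, target in zip(X, y):
        for a in range(k):
            for b in range(k):
                aug[a][b] += row[a] * row[b]
            aug[a][k] += row[a] * target
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                for cc in range(c, k + 1):
                    aug[r][cc] -= f * aug[c][cc]
    return [aug[a][k] / aug[a][a] for a in range(k)]


def simulate(n, eta1_form, eta0_ret, eta1_ret, seed, eta0_form=-2.5):
    """Run one replication with no peer influence; return the estimated
    coefficient on the alter's current trait and the share of ties retained."""
    rng = random.Random(seed)
    y0 = [rng.gauss(50.0, 10.0) for _ in range(n)]                  # step 1
    all_pairs = ((i, j) for i in range(n) for j in range(n) if i != j)
    ties0 = probit_ties(all_pairs, y0, eta0_form, eta1_form, rng)   # steps 2-3
    y1 = [y + rng.gauss(0.0, 5.0) for y in y0]                      # steps 4-5
    ties1 = probit_ties(ties0, y1, eta0_ret, eta1_ret, rng)         # step 6
    X = [[1.0, y0[i], y0[j], y1[j]] for i, j in ties1]              # step 7
    beta = ols(X, [y1[i] for i, _ in ties1])
    return beta[3], len(ties1) / len(ties0)


if __name__ == "__main__":
    for eta1_ret in (0.0, 0.05):
        b3, kept = simulate(n=500, eta1_form=0.0, eta0_ret=0.0,
                            eta1_ret=eta1_ret, seed=1)
        print(f"retention homophily {eta1_ret}: "
              f"beta_3 = {b3:.3f}, retained = {kept:.2f}")
```

Calling `simulate` twice with the same seed but different values of `eta1_ret` holds the initial network and the shocks fixed, so any difference in the estimated coefficient comes from the retention stage alone.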

To illustrate the results, Figure 1 provides a sample network at the end of
one simulation (here n=120 since networks of the size used in our
simulations are too dense to parse visually).

Figure 1: Sample network

Sample network: $\eta_0 = 1$ in step 6; $\eta_1 = 0.05$ in steps 3 and 6;
s.d.(u) = 5; $\eta_0 = -2.5$ in step 3; n = 120 (isolates not displayed).

The simulation process is illustrated in Figure 2. Open squares represent
values of Y for ego-alter pairs after initial friendships have been formed.
There is some correlation due to homophily. The actors then both experience
shocks to their values of the trait, which are represented by arrows. The
new values of the trait are indicated by the circles at the end of each
path. The GEE should estimate the effect of the alter's shock on the ego
controlling for the alter's previous Y value. However, at the friendship
retention stage, some of the pairs cease to be friends. Only those pairs
indicated by the solid circles remain in the data; those that are open
circles have ceased to be friends.

Figure 2: Illustration of one simulation

Step 1: Randomly generate trait values
Steps 2 and 3: Assign ego-alter pairs based on homophily
Step 4: Random shocks to all trait values
Step 5: Peer influence (if any)
Step 6: Attrition based on new homophily

The procedure above modifies CF's original approach in two key ways. The
most important modification is step 6, which repeats the friendship model
from step 3, allowing for friendships to end based on homophily after a
shock to Y. This step is crucial in longitudinal network data as discussed
above. While the shock u to $Y_{t_0}$ is assumed to be randomly
distributed, some people may cease to be friends as a result of (or for
reasons that are correlated with) the shock that they received, inducing a
correlation in u for those dyads whose friendships persist. Our simulation
is intended to test whether this correlation could appear to be a causal
influence effect even if we control for lagged values of the trait.
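The induced correlation in u can be seen directly in a small sketch. The code below is illustrative only and rests on simplifying assumptions that are not in the text: dyads are drawn as independent random pairs with no initial homophily, and the helper names (`pearson`, `shock_correlations`) are hypothetical. It applies the retention rule to the post-shock trait values and compares the correlation of the two shocks across all dyads with the correlation among retained dyads.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shock_correlations(n_dyads, eta0, eta1, seed):
    """Return (correlation of shocks over all dyads,
    correlation of shocks over retained dyads)."""
    rng = random.Random(seed)
    all_u, kept_u = [], []
    for _ in range(n_dyads):
        yi, yj = rng.gauss(50.0, 10.0), rng.gauss(50.0, 10.0)
        ui, uj = rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0)  # independent shocks
        all_u.append((ui, uj))
        d = -abs((yi + ui) - (yj + uj))  # similarity after the shocks
        if eta0 + eta1 * d > rng.gauss(0.0, 1.0):  # probit retention rule
            kept_u.append((ui, uj))

    def corr(pairs):
        return pearson([p[0] for p in pairs], [p[1] for p in pairs])

    return corr(all_u), corr(kept_u)


if __name__ == "__main__":
    c_all, c_kept = shock_correlations(100_000, eta0=0.0, eta1=0.05, seed=7)
    print(f"all dyads: {c_all:.3f}, retained dyads: {c_kept:.3f}")
```

The shocks are independent by construction, so any positive correlation among retained dyads is produced by selection at the retention stage rather than by influence.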

The way we model friendship formation and dissolution also differs from
CF's approach. We introduce a latent variable probit model where a tie
exists if a deterministic component ($A^*_{i,j} = \eta_0 + \eta_1 d_{i,j}$)
is greater than a stochastic error term ($\epsilon_{i,j} \sim N(0, 1)$). By
contrast, CF generate a probability of a tie that is a weighted average of
$Y_{t_0}$ and a random component and then conduct a random draw with that
probability to determine whether a tie exists. This process combines two
sources of random noise. The first is meant to model factors other than
homophily in friendship choice, while the second models the inherently
stochastic component of friendship formation. However, this partition is
not readily interpretable: any unobservable influence on the outcome
variable in a statistical model can reasonably be included in the
stochastic component if it is not also systematically related to the
independent variables.

The result of this double-randomness is that CF's simulations do not
generate sufficiently high correlations in the outcome variable among
friends even when ties are formed on the basis of "complete homophily"
(Fowler and Christakis 2008b, 1405). When we replicated CF's simulations
with homophily set to its maximum value, the mean correlation between ego
and alter on the outcome variable was 0.12 (2.5-97.5 percentile range:
0.05, 0.18). In practice, social network datasets often display higher
levels of correlation among friends. For instance, Halliday and Kwak (2009)
find a BMI correlation of 0.19 among adolescent peers in the Add Health
data after accounting for school fixed effects. The correlation in reported
vote choice between respondents to the 2000 American National Election
Study and their friends ranged from 0.43 to 0.57 (results available upon
request). Even accounting for the effects of projection (i.e., respondents
falsely perceiving that their friends agree with them), these results
suggest that simulations should consider higher levels of homophily. While
some of these correlations may be due to contagion, we cannot assume such
an effect. As such, we designed our simulations to cover a wide range of
homophily on the trait of interest.



5 Monte Carlo results 

Following standard procedure in Monte Carlo evaluations of statistical
models, we set the true contagion effect (i.e., the parameter $b_1$) to 0
and estimate mean bias and coverage levels for the CF model. In our
simulations, we set $\eta_0$ to -2.5 at the friendship formation stage in
order to generate realistic numbers of friendships at $t_0$.[16] We then
vary $\eta_1$ at both stages to generate realistic levels of homophily in
both friendship formation and retention and also vary $\eta_0$ at the
friendship retention stage to consider different friendship attrition
rates.[17]

• Homophily in friendship formation: We vary the homophily parameter
$\eta_1$ in step 3 to cover five possible levels: 0.0, 0.0125, 0.025,
0.0375, and 0.05.

• Homophily in friendship retention: We vary the homophily parameter
$\eta_1$ in step 6 to cover three levels: 0.0, 0.025, and 0.05.

• Levels of friendship retention: We vary $\eta_0$ in step 6, considering
values of 0.0, 0.5, and 1.0 to represent realistic variation in attrition
rates between $t_0$ and $t_1$. We also include the value of 1.85 to cover
the maximum FHS friendship retention rate of 0.96 (see footnote 7
above).[18]
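The three bullets above define a full factorial design. As a small sketch (the variable names are hypothetical), enumerating it gives 5 x 3 x 4 = 60 parameter combinations, 15 for each value of the retention constant:

```python
from itertools import product

ETA1_FORMATION = (0.0, 0.0125, 0.025, 0.0375, 0.05)  # homophily, step 3
ETA1_RETENTION = (0.0, 0.025, 0.05)                  # homophily, step 6
ETA0_RETENTION = (0.0, 0.5, 1.0, 1.85)               # retention constant, step 6

# One dictionary per cell of the design.
grid = [
    {"eta0_ret": c, "eta1_form": f, "eta1_ret": r}
    for c, f, r in product(ETA0_RETENTION, ETA1_FORMATION, ETA1_RETENTION)
]

if __name__ == "__main__":
    print(len(grid), "parameter combinations")
```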



16 CF's Framingham subjects typically only name one friend due to the
nature of the instrument used. However, this structure is unusual and we do
not mimic it here. (See Thomas and Blitzstein N.d. for a related discussion
of how binary networks with censored outdegree information may generate
inflated social influence effects.)

17 In practice, it is possible that homophily in friendship formation and
retention could generate feedback effects in a more elaborate multi-stage
simulation (as, for instance, new, more similar acquaintances displace old,
less similar friends), but we do not attempt to model the complexities of
those potential dynamics here, especially given our lack of substantive
knowledge about how such a process would operate.

18 Additional simulations in which we also vary the standard deviation of
the shock u to $Y_{t_0}$ generate similar results. They are thus omitted
but available on request.

These values correspond to realistic levels of homophily and friendship
retention. For instance, O'Malley and Christakis (N.d.) use FHS data to
test whether they observe homophily in friendship formation and dissolution
using BMI and related measures. Their hierarchical logit models find no
effect for homophily at the formation stage, which would correspond to an
$\eta_1$ value of 0.0 in step 3, but find a statistically significant and
substantively large coefficient on BMI for friendship retention. While we
cannot directly compare coefficient values given differences in data and
estimation techniques, the value of 0.05 for $\eta_1$ in step 6 appears to
be an appropriate comparison.

Complete results from the Monte Carlo simulations, which were performed
1,000 times for each unique combination of model parameters, are presented
in Table 1 at the end of the document. The first result of note is that our
simulations cover realistic ranges of both ego-alter trait correlation and
friendship attrition. As we discuss above, observed ego-alter trait
correlations can be quite high. In the simulations, these correlations
range from approximately 0.0 to 0.6 at $t_0$ and 0.0 to 0.4 at $t_1$.
Similarly, the current simulation yields friendship retention rates of
approximately 50% in the high attrition case (retention constant
$\eta_0 = 0$), 69% in the moderate attrition case ($\eta_0 = 0.5$), 84% in
the low attrition case ($\eta_0 = 1.00$), and 97% in the very low attrition
case ($\eta_0 = 1.85$).
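These retention rates follow from the probit rule in step 6. When the retention homophily coefficient $\eta_1$ is zero, a tie is kept whenever $\eta_0 > \epsilon_{i,j}$, so the retention probability is $\Phi(\eta_0)$, the standard normal CDF evaluated at the retention constant. The short sketch below (function names are hypothetical) evaluates it for the four constants; with $\eta_1 > 0$ retention falls below these values.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def baseline_retention(eta0):
    """Retention probability under the step 6 probit when eta1 = 0."""
    return normal_cdf(eta0)


if __name__ == "__main__":
    for eta0 in (0.0, 0.5, 1.0, 1.85):
        print(eta0, round(baseline_retention(eta0), 2))
```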

Figures 3 and 4 plot how well the GEE estimate of equation 2 performed for
varying values of the homophily coefficients when the constant $\eta_0 = 1$
and $\eta_0 = 0$ at the friendship retention stage (those for
$\eta_0 = 0.5$ and $\eta_0 = 1.85$ are not plotted but are available in
Table 1). These values approximately correspond to attrition rates in FHS
(18%) and Add Health (50%), respectively. For visual clarity, the figures
include bias and coverage levels for $\eta_1$ at the friendship formation
stage of 0, 0.025, and 0.05. (When $\eta_1$ equals 0.0125 and 0.0375 at the
friendship formation stage, the results fall between the three lines in
Figures 3 and 4. They are thus omitted from the figures but described in
Table 1.)

First, Figure 3 presents the probability that the estimated 95% confidence
interval covers the true contagion parameter of 0. When homophily in
friendship retention is 0, the confidence intervals for the CF models
accurately bracket the true value approximately 95 percent of the time.
However, as homophily in friendship retention increases, coverage rates
decline dramatically. For instance, when initial homophily is 0 and
friendship retention is high (the open triangles in the left panel of
Figure 3), coverage falls to approximately 50 percent as homophily in
friendship retention increases. In the most extreme cases, the confidence
interval almost never includes the true value of the influence coefficient.
Even when friendship attrition is very low (approximately 4%), Table 1
indicates that the CF model still has some coverage problems (decreasing as
low as 81 percent) when homophily in friendship retention is high (see lines
46 to 60 of Table 1). In additional simulations, we find that coverage
problems worsen further as sample size increases, thereby increasing the
likelihood that the CF model will falsely reject the null hypothesis that
the influence effect is 0 (in practice, CF's data frequently include far
more than 1,000 observations).
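
The link between sample size and coverage can be illustrated with a short
calculation. The sketch below assumes the estimator is approximately normal
with a fixed bias and a standard error that shrinks in proportion to
1/sqrt(n). The bias and reference standard error are hypothetical values
chosen for illustration and are not taken from Table 1.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def coverage(bias, se, crit=1.96):
    """Probability that estimate +/- crit*se contains the true value.

    Assumes estimate ~ Normal(truth + bias, se^2) and a known se.
    """
    shift = bias / se
    return std_normal_cdf(crit - shift) - std_normal_cdf(-crit - shift)


# Hypothetical illustration values (not from the paper's simulations).
BIAS = 0.03
SE_AT_1000 = 0.02


def coverage_at(n, bias=BIAS, se_ref=SE_AT_1000, n_ref=1000):
    """Coverage at sample size n when the se scales as 1/sqrt(n)."""
    se = se_ref * sqrt(n_ref / n)
    return coverage(bias, se)


for n in (250, 1000, 4000, 16000):
    print(f"n = {n:6d} -> coverage = {coverage_at(n):.3f}")
```

Because the bias does not shrink with n while the interval does, the interval
narrows around the wrong value and nominal 95% coverage erodes as the sample
grows.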

As expected, coverage degrades because the model displays an upward 
bias, as is evident in Figure 4, which presents the mean value of the
estimated peer effect (which has been set to 0 in the simulations). When
unfriending is not affected by homophily, the estimator is unbiased. But as
homophily in friendship retention increases, a correlation emerges in changes
in the trait of interest between egos and alters, which the model interprets 
as evidence of social influence. As a result, estimated bias levels increase 
substantially — up to 0.10 in the worst case. Again, even in the left panel 
where friendship attrition is not large, the bias is substantial if friendship 
attrition is closely related to homophily. 
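
The selection mechanism behind this bias can be reproduced in a few lines.
The following is a minimal sketch, not the simulation used in this paper: it
assumes standardized traits, independent shocks (so true influence is zero),
no homophily at formation, and a probit-style retention rule that depends on
the absolute trait gap after the shock. All parameter values and function
names are hypothetical.

```python
import random
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(eta0, eta1, n_dyads=50_000, seed=0):
    """Return (corr of changes in all dyads, corr in retained dyads, retention rate)."""
    rng = random.Random(seed)
    all_ego, all_alter, kept_ego, kept_alter = [], [], [], []
    for _ in range(n_dyads):
        # Baseline traits are independent: no homophily in formation.
        ego0, alter0 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Shocks are independent: no social influence.
        d_ego, d_alter = rng.gauss(0, 1), rng.gauss(0, 1)
        gap = abs((ego0 + d_ego) - (alter0 + d_alter))
        all_ego.append(d_ego)
        all_alter.append(d_alter)
        # Assumed retention rule: larger post-shock gaps make unfriending likelier.
        if rng.random() < std_normal_cdf(eta0 - eta1 * gap):
            kept_ego.append(d_ego)
            kept_alter.append(d_alter)
    rate = len(kept_ego) / n_dyads
    return pearson(all_ego, all_alter), pearson(kept_ego, kept_alter), rate


for eta1 in (0.0, 1.0):
    r_all, r_kept, rate = simulate(eta0=0.0, eta1=eta1)
    print(f"eta1 = {eta1}: all dyads r = {r_all:+.3f}, "
          f"retained dyads r = {r_kept:+.3f}, retention = {rate:.0%}")
```

With the homophily coefficient at zero, the changes are uncorrelated in both
groups. With a positive coefficient, the dyads that survive are
disproportionately those whose shocks happened to move in the same direction,
so a positive correlation appears among retained ties even though no
influence was simulated.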

Is the level of bias described above meaningful? As a point of comparison,
we note that estimated peer effect coefficients for continuous variables
in the literature are often in the range described in Table 1 and Figure 4.
For example, the coefficient on alter's current BMI is 0.053 in the FHS data
(SE=0.018) and 0.033 in Add Health data (SE=0.014) (Fowler and Christakis
2008).¹⁹ Of course, our simulations cannot prove that the results in any
given study are spurious, but they do suggest that the risk of falsely
rejecting the null hypothesis is high for the CF model when homophily in
friendship retention is present.

19 It would be worthwhile to repeat this exercise with a binary variable as in the CF studies
of smoking or depression.



6 Conclusion 

In this paper, we have argued that the "unfriending" problem complicates 
efforts to estimate causal influence effects in longitudinal social network 
data. While previous studies have demonstrated how homophily in friend- 
ship formation can confound estimates of peer effects, we are the first to 
demonstrate that homophily in friendship retention can create a similar 
problem by inducing an association between random shocks to an outcome 
variable for dyads that remain friends after the shock. We provide evidence 
for this hypothesis using an adaptation of Christakis and Fowler's Monte 
Carlo simulation. Our simulations show that, when friendship attrition 
is present, the CF model suffers from serious bias and coverage problems 
as homophily in friendship retention increases. This association is large 
enough that, under certain parameter values, it could account for most or 
even all of the observed associations in published estimates. 

These results suggest that caution is required in interpreting findings 
of peer effects that rely on maintained ties over time. Though this paper 
focuses on friendship network data, the logic of the "unfriending" prob- 
lem applies to estimates of peer effects in any longitudinal network data in 
which homophily plays a significant role in tie dissolution. For instance, 
Grannis (2010) discusses how inferences about network structure change
if one allows for tie dissolution over time in the Ph.D. exchange network
among sociology graduate programs. If there is significant homophily in 
tie retention in this network (for instance, by methodological or substantive 
focus), one might obtain misleading estimates of "influence" effects among 
graduate programs. 

Going forward, more research is clearly needed on models of peer influence
in observational data. For instance, the actor-based model of Steglich,
Snijders and Pearson (2010) attempts to account for many of the concerns
described above. However, our simulation results suggest that it is
essential to test the properties of such models using simulations in which the
true parameter values are known. 

In addition, more research is needed on how to assess models of peer in- 
fluence. While the parsimonious approach presented here has limitations, 
it provides a flexible framework for more complex simulations of friend- 
ship ties and social influence. Many desirable modifications are possible, 
including incorporating more than two time periods; allowing peer effects 
to vary by friendship duration; estimating the effect of latent traits; in- 
cluding environmental confounders; or allowing for correlations in shocks 
among peers. As research proceeds on models of network formation (e.g., 
Christakis et al. 2010), it may soon also be possible to simulate random
networks that more closely mirror important features of human networks such
as clustering, mutuality, and transitivity. As our theoretical understanding 
of networks improves, so too will our ability to test statistical models of 
social influence in observational data. 